SLO-Driven Right-Sizing and Resource Provisioning of MapReduce Jobs

Authors

  • Abhishek Verma
  • Ludmila Cherkasova
  • Roy H. Campbell
Abstract

Venue: LADIS'2011, held in conjunction with VLDB'2011, Seattle, Washington, Sept. 2-3, 2011.
Report: HP Laboratories HPL-2011-126. Approved for External Publication. External Posting Date: August 21, 2011. Internal Posting Date: August 21, 2011. Copyright 2011 Hewlett-Packard Development Company, L.P.
Keywords: MapReduce; Hadoop; performance models; completion time prediction; resource allocation

A growing number of MapReduce applications, such as personalized advertising, spam detection, and real-time event log analysis, must complete within a given time window. However, system administrators currently lack the performance models and workload analysis tools needed to automate performance management of such MapReduce jobs. In this work, we outline a novel framework for SLO-driven resource provisioning and sizing of MapReduce jobs. First, we propose an automated profiling tool that extracts a compact job profile from past application run(s) or by executing the application on a smaller data set. Then, by applying a linear regression technique, we derive scaling factors that accurately project the application's performance when it processes a larger dataset. The job profile (with scaling factors) forms the basis of a MapReduce performance model that computes lower and upper bounds on the job completion time. Finally, we provide a fast and efficient capacity planning model that generates a set of resource provisioning options for a MapReduce job with timing requirements. We validate the accuracy of our models by executing a set of realistic applications on a 66-node Hadoop cluster.

Affiliations:
  • Abhishek Verma, University of Illinois at Urbana-Champaign, Urbana, IL, US
  • Ludmila Cherkasova, Hewlett-Packard Labs, Palo Alto, CA, US
  • Roy H. Campbell, University of Illinois at Urbana-Champaign, Urbana, IL, US
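The abstract outlines four steps: profiling, regression-based scaling, completion-time bounds, and capacity planning. The Python sketch that follows illustrates the last three steps under stated assumptions. It is not the authors' implementation.

The assumptions are as follows:
  • A phase's duration grows linearly with input size.
  • Each phase's completion time is bounded by the classic makespan bounds for greedily assigning n tasks (average duration avg, maximum duration max) to k slots, namely n·avg/k and (n−1)·avg/k + max.
  • The shuffle phase is folded into the reduce phase.
  • Provisioning options are found by brute-force search against the upper bound.

The names fit_linear, PhaseProfile, makespan_bounds, job_bounds and provisioning_options are hypothetical. A hand-built PhaseProfile stands in for the output of the profiling tool.

```python
"""Sketch of an SLO-driven sizing pipeline for a MapReduce job.

Assumptions (ours, not from the paper): phase durations scale linearly with
input size; per-phase completion time uses the classic greedy makespan
bounds; shuffle is folded into the reduce phase; provisioning options are
found by brute-force search against the upper bound.
"""
from dataclasses import dataclass


def fit_linear(sizes, durations):
    """Least-squares fit of duration ~= c0 + c1 * size; returns (c0, c1)."""
    n = len(sizes)
    mean_x = sum(sizes) / n
    mean_y = sum(durations) / n
    sxx = sum((x - mean_x) ** 2 for x in sizes)
    if sxx == 0:
        raise ValueError("need at least two distinct input sizes")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, durations))
    c1 = sxy / sxx
    return mean_y - c1 * mean_x, c1


@dataclass
class PhaseProfile:
    """Hypothetical per-phase profile: task count and task durations (seconds)."""
    n_tasks: int
    avg: float  # average task duration
    mx: float   # maximum task duration


def makespan_bounds(phase, slots):
    """Lower/upper bounds for greedily running phase.n_tasks tasks on `slots` slots."""
    lower = phase.n_tasks * phase.avg / slots
    upper = (phase.n_tasks - 1) * phase.avg / slots + phase.mx
    return lower, upper


def job_bounds(map_phase, reduce_phase, map_slots, reduce_slots):
    """Job bounds as the sum of map-phase and reduce-phase bounds."""
    m_lo, m_hi = makespan_bounds(map_phase, map_slots)
    r_lo, r_hi = makespan_bounds(reduce_phase, reduce_slots)
    return m_lo + r_lo, m_hi + r_hi


def provisioning_options(map_phase, reduce_phase, deadline,
                         max_map_slots, max_reduce_slots):
    """(map_slots, reduce_slots) pairs whose upper bound meets the deadline.

    For each map-slot count, keep the smallest reduce-slot count that works,
    and only if it is smaller than every reduce-slot count already recorded,
    so no returned pair is dominated by another.
    """
    options = []
    best_sr = max_reduce_slots + 1
    for sm in range(1, max_map_slots + 1):
        for sr in range(1, best_sr):
            _, upper = job_bounds(map_phase, reduce_phase, sm, sr)
            if upper <= deadline:
                options.append((sm, sr))
                best_sr = sr
                break
    return options


if __name__ == "__main__":
    # Scaling factors from runs on 1, 2 and 3 units of input.
    c0, c1 = fit_linear([1, 2, 3], [12.0, 22.0, 32.0])
    print(c0, c1)  # 2.0 10.0

    maps = PhaseProfile(n_tasks=20, avg=10.0, mx=15.0)
    reduces = PhaseProfile(n_tasks=4, avg=20.0, mx=30.0)
    print(provisioning_options(maps, reduces, deadline=120.0,
                               max_map_slots=20, max_reduce_slots=4))
    # [(4, 3), (5, 2), (13, 1)]
```

Checking the deadline against the upper bound makes every returned allocation conservative. Targeting a value between the two bounds is an equally possible design choice. The nested loop is only for clarity; a closed-form or binary search would scale better to large clusters.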

Similar resources

Resource Provisioning Framework for MapReduce Jobs with Performance Goals

Many companies are increasingly using MapReduce for efficient large scale data processing such as personalized advertising, spam detection, and different data mining tasks. Cloud computing offers an attractive option for businesses to rent a suitable size Hadoop cluster, consume resources as a service, and pay only for resources that were utilized. One of the open questions in such environments ...

Big Data Using Hadoop

17ANSP-BD-001 Hadoop Performance Modeling for Job Estimation and Resource Provisioning. MapReduce has become a major computing model for data intensive applications. Hadoop, an open source implementation of MapReduce, has been adopted by an increasingly growing user community. Cloud computing service providers such as Amazon EC2 Cloud offer the opportunities for Hadoop users to lease a certain amou...

Towards Optimizing Hadoop Provisioning in the Cloud

Data analytics is becoming increasingly prominent in a variety of application areas ranging from extracting business intelligence to processing data from scientific studies. The MapReduce programming paradigm lends itself well to these data-intensive analytics jobs, given its ability to scale out and leverage several machines to process data in parallel. In this work we argue that such MapReduce-base...

Dynamically Scheduling a Component-Based Framework in Clusters

In many clusters and datacenters, application frameworks are used that offer programming models such as Dryad and MapReduce, and jobs submitted to the clusters or datacenters may be targeted at specific instances of these frameworks, for example because of the presence of certain data. An important question that then arises is how to allocate resources to framework instances that may have highl...

QoS-Based Pricing and Scheduling of Batch Jobs in OpenStack Clouds

The current Cloud infrastructure services (IaaS) market employs a resource-based selling model: customers rent nodes from the provider and pay per-node per-unit-time. This selling model places the burden upon customers to predict their job resource requirements and durations. Inaccurate prediction by customers can result in over-provisioning of resources, or under-provisioning and poor job perf...

Publication date: 2011